Comments

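Comments are notes in your source code that aren't executed when your code is run. They are useful for reminding yourself what your code does and for communicating your intentions to others. Python has single-line comments, which start with #. Triple-quoted strings are often used as multi-line comments; strictly speaking, they are string literals that are evaluated and then discarded.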


In [2]:
# this is a single line comment
print "trying out some comments In class"

"""and here
is a multi-line comment"""


print "python ftw"


trying out some comments In class
python ftw

Variables

Variables are aliases for data. This allows the developer to use the name for a particular value rather than the value itself. This makes the code more readable, and allows various optimizations to make a program run more efficiently.

In Python, a variable can be named almost anything, according to the whims of the programmer. Names may contain letters, digits, and the underscore character "_", provided the name does not start with a digit. Whitespace and characters with special meanings in Python, such as "+" and "-", are not allowed. Variable names are case-sensitive. The common pattern is to separate words in variable names with underscores "_".

Variables are declared by stating the variable name and assigning to it using the "=" operator. At any time, you can reassign a value to a variable.


In [3]:
a_variable = "yay!"
another_variable = "woo!"

print a_variable

a_variable = "uh oh!" # reassigning

print a_variable

a_variable = another_variable # reassigning again

print a_variable


yay!
uh oh!
woo!
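As a small sketch of the naming rules described above, the following uses made-up names chosen only for illustration. print is written with parentheses here so the snippet runs under both Python 2 and Python 3.

```python
# names may contain letters, digits, and underscores,
# but may not start with a digit
user_count_2 = 10

# names are case-sensitive: these are two different variables
total = 1
Total = 2

print(total)         # 1
print(Total)         # 2
print(user_count_2)  # 10
```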

Primitive Data Types

These are the basic data types that constitute all of the more complex data structures in python.

String:

Textual data, characters and sequences of characters. Can be specified by surrounding some text with single ' or double " quotes.


In [3]:
str_1 = "hello"
str_2 = "world"
print str_1 + " " + str_2 + "!" # note that + concatenates strings

str_3 = """
this is a multiline
string!
"""

print str_3


hello world!

this is a multiline
string!

Integer:

Positive and negative whole numbers


In [4]:
int_1 = 1
int_2 = -2
int_3 = 100

print int_1 + int_2 + int_3 # for integers, + is just "plus"


99

Floating Point Numbers:

Decimal numbers, representations of fractions, and "real-valued numbers"


In [5]:
float_1 = 1.2
float_2 = -4.0
float_3 = 10.0

print float_1 + float_2 + float_3


7.2

Booleans:

Booleans represent the truth or success of a statement, and are commonly used for branching and checking status in code.


In [6]:
bool_1 = True 
bool_2 = False

print bool_1
print bool_2


True
False

Operations on Primitive Data Types

Python provides a variety of operations for performing common tasks on the primitive data types presented above. Of course, this list isn't complete, and the core functionality provided by python is greatly extended by library code, some of which will be discussed below. Note that operations can be performed either on "literal primitives" or on variables storing some primitive.

Operations on Strings

We've already seen one of the most common string operators, +, used for string concatenation. Below are some of the more commonly used string operations:

  • + : concatenate two strings
  • len(str): length of a string, number of characters
  • str.upper(): returns an uppercase version of a string
  • str.lower(): returns a lowercase version of a string
  • haystack.index(needle): searches haystack for needle and returns the position of the first occurrence, indexed from 0. Note: raises a ValueError if needle isn't found.
  • str_1.count(str_2): counts the number of occurrences of one string in another.
  • haystack.startswith(needle): does the haystack string start with the needle string?
  • haystack.endswith(needle): does the haystack string end with the needle string?
  • str_1.split(str_2): split the first string at every occurrence of the second string. Outputs a list (see below).
  • ==: are the two operand strings the same?
  • str.strip(): remove any whitespace from the left or right of the string, including newlines.

A better list of string operations is available here.


In [7]:
print "concatenation:"
print str_1 + " " + str_2
print str_1 + " everybody"

print "length:"
print len(str_1)
print len(str_1 + " " + str_2)

print "string casing:"
print str_1.upper()
print "HELLO".lower()

print "string indexing:"
print "hello".index("ll")
print "hello".upper().index("LL")

print "string count:"
print str_1.count("l")
print str_1.count("ll")

print "starts with & endswith:"
print "hello".startswith("he")
print "hello".endswith("world")

print "split:"
print "practical data science".split(" ")
print "hello".split(" ")

print "equality:"
print str_1 == "hello"
print str_1 == "HELLO"


concatenation:
hello world
hello everybody
length:
5
11
string casing:
HELLO
hello
string indexing:
2
2
string count:
2
1
starts with & endswith:
True
False
split:
['practical', 'data', 'science']
['hello']
equality:
True
False
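Two items from the list of string operations are not exercised in the cell above: strip(), and the error raised by index() when the needle is missing. A minimal sketch follows; the sample strings are made up, and find() is a related built-in string method not covered in the list. print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
padded = "  hello world \n"
cleaned = padded.strip()      # whitespace is removed from both ends only
print(cleaned)                # hello world

# index raises a ValueError when the needle is absent
try:
    position = "hello".index("z")
except ValueError:
    position = -1
print(position)               # -1

# find is a related method that returns -1 instead of raising
print("hello".find("z"))      # -1
```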

Operations on Numeric Types:

There are a bunch of common mathematical operations available on numeric types in python. If an operation is being performed on two integers, then the output will also be an integer. If one of the operands is a float, then the remaining operand will be cast into a float, and the result will likewise be a float.

  • +: plus, add two numbers
  • -: minus, subtract two numbers. If put before a single numeric value, takes the negative of that value.
  • *: multiply two numbers
  • /: divide the first operand by the second.
  • %: modulus, what is the remainder when the first number is divided by the second?

In [8]:
print "addition:"
print 1+1
print 1.5 + 1

print "subtraction:"
print 1-1
print 1-1.0

print "negation:"
x = 5
print -x

print "multiplication:"
print 3*3
print 2*2.0

print "division:"
print 5/2 # integer division!
print 5/2.0

print "modulous:"
print 5%2
print 5.5%2


addition:
2
2.5
subtraction:
0
0.0
negation:
-5
multiplication:
9
4.0
division:
2
2.5
modulous:
1
1.5
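The integer result of 5/2 shown above is Python 2 behavior; in Python 3 the / operator returns a float even for two integers. The sketch below sticks to forms that behave the same in both versions: the floor-division operator // and an explicit float operand. The variable names are chosen only for illustration.

```python
floor_quotient = 5 // 2        # floor division drops the remainder
true_quotient = 5 / 2.0        # a float operand gives a float result
converted = float(5) / 2       # converting an integer explicitly
negative_floor = -5 // 2       # floor division rounds toward negative infinity
negative_mod = -5 % 2          # the result of % takes the sign of the divisor

print(floor_quotient)   # 2
print(true_quotient)    # 2.5
print(converted)        # 2.5
print(negative_floor)   # -3
print(negative_mod)     # 1
```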

There are also a bunch of comparison operators on numeric values:

  • ==: equality of values
  • <: less than
  • <=: less than or equal to
  • >: greater than
  • >=: greater than or equal to
  • !=: not equal to, different than

These all return a boolean with a value that depends on the outcome of the comparison.


In [9]:
print "equals:"
print 1 == 2
print 1 == 1
print 1 == 1.0

print "comparison:"
print 1 > 0
print 1 > 1
print 1.0 > 1
print 1 >= 1
print 1.0 != 1


equals:
False
True
True
comparison:
True
False
False
True
False

Boolean Operations:

Frequently, one wants to combine or modify boolean values. Python has several operations for just this purpose:

  • not a: returns the opposite value of a.
  • a and b: returns true if and only if both a and b are true.
  • a or b: returns true if either a or b is true, or both.

Like mathematical expressions, boolean expressions can be nested using parentheses.
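A short sketch of the three boolean operations and of nesting with parentheses. The variable names are illustrative, and print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
a = True
b = False

print(not a)            # False
print(a and b)          # False
print(a or b)           # True

# parentheses control grouping, as in arithmetic
print(not (a and b))    # True
print((not a) and b)    # False

# comparisons produce booleans, so they combine the same way
x = 7
in_range = (x > 0) and (x < 10)
print(in_range)         # True
```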

String Formatting:

Often one wants to embed other information into strings, sometimes with special formatting constraints. In python, one may insert special formatting characters into strings that convey what type of data should be inserted and where, and how the "stringified" form should be formatted. For instance, one may wish to insert an integer into a string:


In [10]:
print "To be or not %d be" % 2


To be or not 2 be

Note the %d formatting (or conversion) specifier in the string. It states that you wish to insert an integer value (more on these conversion specifiers below). The value to insert follows the string, separated from it by a % character. If you wish to insert more than one value into the string being formatted, place the values in a comma-separated list, surrounded by parentheses, after the %:


In [11]:
print "%d be or not %d be" % (2, 2)


2 be or not 2 be

In detail, a conversion specifier contains two or more characters, which must occur in the following order:

  • The % character, which marks the start of the specifier
  • Optional flags, such as 0, which pads numbers with zeros instead of spaces
  • An optional minimum field width. The value being inserted is padded to be at least this width
  • An optional precision value, given as a "." followed by the number of digits of precision.
  • The conversion type character, listed below.

For a more detailed treatment on string formatting options, see here.

Some common conversion type characters are:

  • d: Signed integer decimal.
  • i: Signed integer decimal.
  • e: Floating point exponential format (lowercase).
  • E: Floating point exponential format (uppercase).
  • f: Floating point decimal format.
  • c: Single character (accepts integer or single character string).
  • r: String (converts any python object using repr()).
  • s: String (converts any python object using str()).

In [12]:
print "%d %s or not %04.1f %c" % (2, "be", 2, 'b')


2 be or not 02.0 b
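To make the width and precision components more concrete, here is a small sketch that formats the same number several ways. The variable names are illustrative, and print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
pi = 3.14159265

two_places = "%.2f" % pi            # precision only
padded = "%8.3f" % pi               # minimum width 8, padded with spaces
zero_padded = "%08.3f" % pi         # a leading 0 pads with zeros instead
as_repr = "%r vs %s" % ("a", "a")   # repr() keeps the quotes, str() does not

print(two_places)    # 3.14
print(zero_padded)   # 0003.142
print(as_repr)       # 'a' vs a
print(len(padded))   # 8
```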

Data Structures

We have covered the basics of Python's primitive data types in some detail. It's now useful to consider how these basic types can be collected in ways that are meaningful and useful for a variety of tasks. Data structures are a fundamental component of programming: a data structure is a collection of data elements that adheres to certain properties, depending on its type. In these notes, we'll present three basic data structures: the list, the set, and the dictionary. Python's data structures are very rich, and a full treatment is beyond the scope of this simple primer. Please see the documentation for a more complete view.

List:

A list, sometimes called an array or a vector, is an ordered collection of values. The value of a particular element in a list is retrieved by querying for a specific index into the list. Lists allow duplicate values, but indices are unique. In Python, like most programming languages, list indices start at 0; that is, to get the first element in a list, request the element at index 0. Lists provide very fast access to elements at specific positions, but are inefficient at "membership queries," determining if an element is in the list.

In python, lists are specified by square brackets, [ ], containing zero or more values, separated by commas. Lists are the most common data structure, and are often generated as a result of other functions, for instance, a_string.split(" ").

To query a specific value from a list, pass in the requested index into square brackets following the name of the list. Negative indices can be used to traverse the list from the right.


In [13]:
a_list = [1, 2, 3]
another_list = ["a", "b", "c"]
empty_list = []
mixed_list = [1, "a"]

print another_list[1]
print a_list[-1] # indexing from the right


b
3

Some common functionality of lists:

  • list.append(x): add an element to the end of a list
  • list_1.extend(list_2): add all elements in the second list to the end of the first list
  • list.insert(index, x): insert element x into the list at the specified index. Elements to the right of this index are shifted over
  • list.pop(index): remove the element at the specified position and return it
  • list.index(x): looks through the list to find the specified element, returning its position if it's found, else raises a ValueError
  • list.count(x): counts the number of occurrences of the input element
  • list.sort(): sorts the list of items
  • list.reverse(): reverses the order of the list
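The list methods above can be tried out in a short sketch like the following. The list name and values are made up for illustration, and print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
items = [3, 1, 2]

items.append(4)            # [3, 1, 2, 4]
items.extend([1, 5])       # [3, 1, 2, 4, 1, 5]
items.insert(0, 9)         # [9, 3, 1, 2, 4, 1, 5]
removed = items.pop(0)     # removes and returns the element at index 0
print(removed)             # 9
print(items.index(2))      # 2
print(items.count(1))      # 2

items.sort()               # sorts in place and returns None
print(items)               # [1, 1, 2, 3, 4, 5]
items.reverse()            # also in place
print(items)               # [5, 4, 3, 2, 1, 1]
```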

Set:

A set is a data structure where all elements are unique. Sets are unordered. In fact, the order of the elements observed when printing a set might change at different points during a program's execution, depending on the state of Python's internal representation of the set. Sets are ideal for membership queries, for instance, is a user amongst those users who have received a promotion?

Sets are specified by curly braces, { }, containing one or more comma separated values. To specify an empty set, you must use the alternative construct, set(), because empty curly braces create an empty dictionary.


In [14]:
some_set = {1, 2, 3, 4}
another_set = {4, 5, 6}
empty_set = set()

The easiest way to check for membership in a set is to use the in keyword, checking if a needle is "in" the haystack set.


In [15]:
print 1 in some_set
print 0 in some_set


True
False

Some other common set functionality:

  • set_a.add(x): add an element to a set
  • set_a.remove(x): remove an element from a set
  • set_a - set_b: elements in a but not in b
  • set_a | set_b: elements in a or b
  • set_a & set_b: elements in both a and b
  • set_a ^ set_b: elements in a or b but not both
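A brief sketch of the set operations listed above. The sets are made up for illustration; sorted() is used only to give a predictable order for printing, since sets themselves are unordered. print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
set_a = {1, 2, 3, 4}
set_b = {3, 4, 5}

set_a.add(6)       # set_a is now {1, 2, 3, 4, 6}
set_a.remove(6)    # back to {1, 2, 3, 4}; raises KeyError if 6 is absent

print(sorted(set_a - set_b))   # [1, 2]
print(sorted(set_a | set_b))   # [1, 2, 3, 4, 5]
print(sorted(set_a & set_b))   # [3, 4]
print(sorted(set_a ^ set_b))   # [1, 2, 5]

# duplicates are dropped when a set is built from a list
unique_count = len(set([1, 1, 2, 2, 3]))
print(unique_count)            # 3
```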

Dictionaries:

Dictionaries, sometimes called dicts, maps, or, rarely, hashes, are data structures containing key-value pairs. A dictionary has a set of unique keys and is used to retrieve the value associated with each key. For instance, a dictionary might store each user's location, keyed by user, or the description of a product, keyed by product id. Lookup in a dictionary is very efficient, which is one reason dictionaries are encountered so frequently in practice.

Dictionaries are specified by curly braces, { }, containing zero or more comma separated key-value pairs, where the keys and values are separated by a colon, :. Like a list, values for a particular key are retrieved by passing the query key into square brackets.


In [16]:
a_dict = {"a":1, "b":2, "c":3}
another_dict = {"c":5, "d":6}
empty_dict = {}
print a_dict["b"]


2

Like the set, the easiest way to check if a particular key is in a map is through the in keyword:


In [17]:
print "a" in a_dict
print "b" in another_dict


True
False

Some common operations on dictionaries:

  • dict.keys(): returns a list containing the keys of a dictionary
  • dict.values(): returns a list containing the values in a dictionary
  • dict.pop(x): removes the key and its associated value from the dictionary
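The following sketch exercises the dictionary operations above. The dictionary contents are hypothetical, and get() and assignment by key are standard dictionary features that go slightly beyond the list. sorted() is used so the printed order does not depend on the Python version.

```python
prices = {"apple": 3, "pear": 5}   # hypothetical data

prices["plum"] = 7                 # add a new key-value pair
prices["apple"] = 4                # overwrite an existing value

removed = prices.pop("pear")       # removes the key and returns its value
print(removed)                     # 5

print(sorted(prices.keys()))       # ['apple', 'plum']
print(sorted(prices.values()))     # [4, 7]

# get returns a default instead of raising KeyError for a missing key
missing = prices.get("kiwi", 0)
print(missing)                     # 0
```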

Combining (Nesting) Data Structures:

There are many opportunities to combine data types in python. Lists can be populated by arbitrary data structures. Similarly, you can use any type as the value in a dictionary. However, the elements of sets, and the keys of dictionaries need to have some special properties that allow the mechanics of the data structure to determine how to store the element.

Aside: to use a particular element in a set or as a key in a dictionary, it must define a hash function, __hash__. In a nutshell, a hash function maps a data element to a number in a predefined range, based on the characteristics of that element. If the contents of an element could change, the value of its __hash__ function would change too, causing problems for the algorithms powering sets and dictionaries. For this reason Python's built-in mutable data structures (lists, sets, and dictionaries) cannot be used as set elements or dictionary keys, while immutable values such as strings and numbers can.


In [18]:
print "lists of lists"
lol = [[1, 2, 3], [4, 5, 6]]
lol_2 = [[4, 5, 6], [7, 8, 9]]
print lol

print "lists of lists of lists"
lolol = [lol, lol_2]
print lolol

print "retrieving data from this data structure"
print lolol[0]
print lolol[0][0]
print lolol[0][0][0]

print "data structures as values in a dictionary"
dlol = {"lol":lol, "lol_2":lol_2}
print dlol

print "retrieving data from this dictionary"
print dlol["lol"]
print dlol["lol"][0]
print dlol["lol"][0][0]


lists of lists
[[1, 2, 3], [4, 5, 6]]
lists of lists of lists
[[[1, 2, 3], [4, 5, 6]], [[4, 5, 6], [7, 8, 9]]]
retrieving data from this data structure
[[1, 2, 3], [4, 5, 6]]
[1, 2, 3]
1
data structures as values in a dictionary
{'lol': [[1, 2, 3], [4, 5, 6]], 'lol_2': [[4, 5, 6], [7, 8, 9]]}
retrieving data from this dictionary
[[1, 2, 3], [4, 5, 6]]
[1, 2, 3]
1
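The aside on hashing can be checked directly. In this sketch a tuple (an immutable sequence, written with parentheses, that these notes do not otherwise cover) works as a dictionary key and set element, while a list is rejected with a TypeError. All names and values are made up for illustration.

```python
# immutable values such as strings, numbers, and tuples of them are hashable
locations = {("nyc", 10003): "office"}   # a tuple used as a dictionary key
seen = {(1, 2), (3, 4)}                  # tuples as set elements
print((1, 2) in seen)                    # True
print(locations[("nyc", 10003)])         # office

# a list is mutable and not hashable, so it is rejected as a key
try:
    bad = {[1, 2]: "value"}
    outcome = "accepted"
except TypeError:
    outcome = "rejected"
print(outcome)                           # rejected
```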

Control Structures

We've spent some time going into detail about some of the data types and structures available in Python. It's now time to talk about how to navigate through some of this data, and use data to make decisions. Traversing over data and making decisions based upon data are a common aspect of every programming language, known as control flow. Python provides a rich control flow, with a lot of conveniences for the power users. Here, we're just going to talk about the basics; to learn more, please consult the documentation.

A common theme throughout this discussion of control structures is the notion of a "block of code." Blocks of code are demarcated by a specific level of indentation, and are typically introduced by some control structure element that ends in a colon, :. We'll see examples below.

Finally, note that control structures can be nested arbitrarily, depending on the tasks you're trying to accomplish.

if Statements:

If statements are perhaps the most widely used of all control structures. An if statement consists of a code block and an argument. The if statement evaluates the boolean value of its argument, executing the code block if that argument is true.


In [19]:
if True:
    print "duh"
    
if 1+1 == 2:
    print "easy"
    
if 2+2 == 5:
    print "really?"
    
items = {1, 2, 3}
if 2 in items:
    print "found it!"


duh
easy
found it!

Each argument in the above if statements is a boolean expression. Often you want to have alternatives, blocks of code that get evaluated in the event that the argument to an if statement is false. This is where elif (else if) and else come in.

An elif is evaluated only if all preceding if or elif arguments have evaluated to false. The else statement is the last resort, assigning the code that gets executed if no if or elif above it is true. Both are optional: an if statement may be followed by any number of elif statements and at most one else, which must come last. At most one code block in the chain is evaluated. An else will always have its code executed if nothing above it is true.


In [20]:
if 1+2 == 2:
    print "whoa"
    x = 5+1
    print "done"
elif 1+1 == 0:
    print "that explains it"
elif 5+5 == 9:
    print "something"
    if "something":
        print "hi"
    else:
        print "what I expected"
# no condition in the chain above is true and it has no else,
# so none of its blocks run

x = {1,2,3}
if 5 in x:
    print "found it"
else:
    print "didn't find it"
    x.add(5)
print x

if False:
    print "shouldn't happen"
elif True:
    print "should happen"


didn't find it
set([1, 2, 3, 5])
should happen
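As a compact sketch of how a single if / elif / else chain picks exactly one branch, consider the following. The variable names and thresholds are made up for illustration, and print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
number = 7

# conditions are tested from top to bottom
if number < 0:
    label = "negative"
elif number == 0:
    label = "zero"
elif number < 10:
    label = "small"    # first true condition; later branches are skipped
else:
    label = "large"    # would run only if every condition above were false

print(label)           # small
```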

for Statements:

for statements are a convenient way to iterate through the values contained in a data structure. The loop goes through the elements in the data structure one at a time, assigning each element in turn to a variable. The code block associated with the for statement (or for loop) is then evaluated with this value.


In [21]:
# variable names avoid set, list, and dict, which would hide Python's built-ins
num_set = {1, 2, 3, 4}
for foobar in num_set:
    print foobar

print "a more complex block"
for num in num_set:
    if num >= 3:
        print num+5

print "this also works for lists"
num_list = [1,2,3]
for num in num_list:
    if num >= 2:
        print num+5

print "dictionaries let you iterate through keys, values, or both"
small_dict = {"a":1, "b":2}

for k in small_dict.keys():
    value = small_dict[k]  # look up the value for this key
    print k
for v in small_dict.values():
    print v
for k,v in small_dict.iteritems():
    if v == small_dict[k]:
        print "whew!"


1
2
3
4
a more complex block
8
9
this also works for lists
7
8
dictionaries let you iterate through keys, values, or both
a
b
1
2
whew!
whew!

Break and Continue:

These two statements are used to modify the iteration of loops. break exits the innermost loop in which it appears. continue skips the rest of the current pass through the loop, going on to the next iteration.


In [22]:
x = [1,2,3,4,5]
for num in x:
    print num
    if num > 2:
        break
        
y = ["a", "b", "c", "d"]
for letter in y:
    if letter == "a":
        continue
    print letter


1
2
3
b
c
d

In [10]:
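# illustrative sketch: has_purchase() is assumed to be a method on each
# customer object; with an empty customer list the loop body never runs
list_of_all_custs = []
custs_with_purch = []

for cust in list_of_all_custs:
    if cust.has_purchase():
        custs_with_purch.append(cust)
    if len(custs_with_purch) > 100:
        break  # stop once more than 100 purchasing customers are collected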



Ranges of Integers:

Often it is convenient to define (and iterate through) ranges of integers. Python has a convenient range function that allows you to do just this.


In [23]:
print range(3) # start at zero, < the specified ceiling value
print range(-5, 5) #from the left value, < right value
print range(-5, 5, 2) #from the left value, to the middle value, incrementing by the right value

for x in range(-5, 5):
    if x > 0:
        print "%d is positive" % x


[0, 1, 2]
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
[-5, -3, -1, 1, 3]
1 is positive
2 is positive
3 is positive
4 is positive

User Defined Functions

Functions assign a name to a block of code the way variables assign names to bits of data. This seemingly benign naming of things is incredibly powerful, allowing one to reuse common functionality over and over. Well-tested functions form building blocks for large, complex systems. As you progress through Python, you'll find yourself using powerful functions defined in some of Python's vast libraries of code.

Function definitions begin with the def keyword, followed by the name you wish to assign to a function. Following this name are parentheses, ( ), containing zero or more variable names, those values that are passed into the function. There is then a colon, followed by a code block defining the actions of the function:


In [24]:
def print_hi():
    print "hi!"

def hi_you(name):
    print "hi %s!" % name

def square(num):
    squared = num*num
    return squared

print_hi()
hi_you("josh")
print square(100)


hi!
hi josh!
10000

Note that the function square has a special keyword return. The argument to return is passed back to whatever piece of code is calling the function; in this case, that is the square of the number that was input.

Variables set inside of functions are said to be scoped to those functions: changes, including any new variables created, are only accessible while in the function code block (with some exceptions). If an "outside" variable is assigned to inside a function, Python creates a new local variable with the same name, and the outside variable is left untouched.

Similarly, reassigning a function's argument isn't reflected once the function returns; the caller's variable will continue to point to the original thing. However, it is possible to modify the thing that is passed, assuming that it is mutable.


In [12]:
# inside a function's context, changes to a variable defined outside that
# context aren't reflected once the context is returned

name = "josh"
def do_something():
    name = "not josh"
    print "something!"

do_something()
print name

# but outside variables can be read!
def do_something_else():
    print name
do_something_else()

def do_something_new(some_name):
    some_name = "nothing"
    print some_name

do_something_new(name)
print name
    
# mutable objects can be modified
a_list = [1,2,3]
def add_sum(some_list):
    s = sum(some_list)
    some_list.append(s)
    some_list = []
    return s

tot = add_sum(a_list)
print tot
print a_list

# try again!
tot = add_sum(a_list)
print tot
print a_list


something!
josh
josh
nothing
josh
6
[1, 2, 3, 6]
12
[1, 2, 3, 6, 12]

In [26]:
# variables created in a function aren't accessible 
# outside that function's context
def do_something_new():
    thing = "123"
    print "Hi!"
do_something_new()
print thing


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-26-79d666b041f8> in <module>()
      5     print "Hi!"
      6 do_something_new()
----> 7 print thing

NameError: name 'thing' is not defined
Hi!

In [1]:
def times_two(input):
    input = 2*input
    return input

four = 4
print times_two(four)
print four


8
4

Files and Printing

You'll often be reading data from a file, or writing the output of your python scripts back into a file. Python makes this very easy. You need to open a file in the appropriate mode, using the open function, then you can read or write to accomplish your task. The open function takes two arguments, the name of the file, and the mode. The mode is a single letter string that specifies if you're going to be reading from a file, writing to a file, or appending to the end of an existing file. The function returns a file object that performs the various tasks you'll be performing: a_file = open(filename, mode). The modes are:

  • 'r': open a file for reading
  • 'w': open a file for writing. Caution: this will overwrite any previously existing file
  • 'a': append. Write to the end of a file.

When reading, you typically want to iterate through the lines in a file using a for loop, as above. Some other common methods for dealing with files are:

  • file.read(): read the entire contents of a file into a string
  • file.readline(): read one line of a file
  • file.write(some_string): writes to the file. Note that this doesn't automatically add a newline. Also note that writes are sometimes buffered: Python will wait until you have several writes pending, and perform them all at once
  • file.flush(): write out any buffered writes
  • file.close(): close the open file. This will free up some computer resources occupied by keeping a file open.
  • file.seek(position): moves to a specific position within a file. Note that position is specified in bytes.

Here is an example using files:


In [13]:
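# names avoid file, list, and set, which would hide Python's built-ins
out_file = open("temp.txt", "w")
letters = ["a", "b", "c", "d"]
numbers = {1, 2, 3, 4}

for x in letters:
    out_file.write("letter: %s\n" % x)
for n in numbers:
    out_file.write("number: %d\n" % n)
out_file.flush()
out_file.close()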

file_2 = open("temp.txt", "r")
for line in file_2:
    print line # note that this doesn't strip off the newlines
file_2.close()

file_3 = open("temp.txt", "r")

print file_3.read()
file_3.close()


# filter rows
file_4 = open("temp.txt", "r")
for line in file_4:
    if line.count("a") > 0:
        continue
    print line.strip() # remove the extra newline.
file_4.close()

# filter columns
file_5 = open("temp.txt", "r")
for line in file_5:
    columns = line.strip().split(" ")
    if columns[1] != "b":
        print columns # prints out the list
    print " ".join(columns) # prints it out as a string
file_5.close()


letter: a

letter: b

letter: c

letter: d

number: 1

number: 2

number: 3

number: 4

letter: a
letter: b
letter: c
letter: d
number: 1
number: 2
number: 3
number: 4

letter: b
letter: c
letter: d
number: 1
number: 2
number: 3
number: 4
['letter:', 'a']
letter: a
letter: b
['letter:', 'c']
letter: c
['letter:', 'd']
letter: d
['number:', '1']
number: 1
['number:', '2']
number: 2
['number:', '3']
number: 3
['number:', '4']
number: 4
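The example above does not use readline() or seek() from the list of file methods. A minimal sketch of both follows; it writes a scratch file into a temporary directory (the file name and contents are made up), and uses the standard tempfile and os modules only for setup and cleanup. print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
import os
import tempfile

# write a small scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "scratch.txt")
out_file = open(path, "w")
out_file.write("first\nsecond\nthird\n")
out_file.close()

in_file = open(path, "r")
line_1 = in_file.readline()   # reads up to and including the newline
line_2 = in_file.readline()   # the next call continues where the last stopped
in_file.seek(0)               # jump back to the start of the file
everything = in_file.read()   # the whole file as one string
in_file.close()

print(line_1.strip())           # first
print(line_2.strip())           # second
print(everything.count("\n"))   # 3

os.remove(path)                 # clean up the scratch file
```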

Simple Data Processing with Python

See example problem!

Importing Libraries

One of the greatest strengths of the Python programming language is its rich set of libraries: pre-written code that implements a variety of functionality. For the data scientist, Python's libraries (also called "modules") are particularly valuable. With a little bit of research into the details of Python's libraries, a lot of common data mining tasks are little more than a function call away. Libraries exist for doing data cleaning, analysis, visualization, machine learning and statistics.

In order to have access to a library's functionality in a block of code, you must first import it. Importing a library tells Python that while executing your code, it should not only consider the code and functions that you have written, but also code and functions in the libraries that you have imported.

There are several ways to import modules in Python, and some have better properties than others. Below we see the preferred general way to import modules. In documentation, you may see other ways to import libraries (from a_library import foo). There is no risk to just copying this pattern if it is known to work.

Imagine I want to import a library called some_python_library. This can be done using the import statement. All code below that import statement has access to the library contents.

  • import some_python_library: imports the module some_python_library, and creates a reference to that module in the current namespace. Or in other words, after you’ve run this statement, you can use some_python_library.name to refer to things defined in module some_python_library.

  • import some_python_library as plib: imports the module some_python_library and sets an alias for that library that may be easier to refer to. To refer to a thing defined in the library some_python_library, use plib.name.

In practice you'll see the second pattern used very frequently; pandas referred to as pd, numpy referred to as np, etc.


In [3]:
import math
number = 2
print math.sqrt(number)


1.41421356237

In [4]:
import math as m
print m.log(number)


0.69314718056
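For comparison, the sketch below puts the two import patterns side by side with the from ... import form mentioned earlier, using the standard math module. All three names end up referring to the same function. print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
import math
import math as m
from math import sqrt   # brings the name sqrt directly into the namespace

print(math.sqrt(16))    # 4.0
print(m.sqrt(16))       # 4.0
print(sqrt(16))         # 4.0

# all three names refer to the same function object
same_function = (math.sqrt is m.sqrt) and (m.sqrt is sqrt)
print(same_function)    # True
```

The from form saves typing, but it hides where a name came from, which is one reason the module-prefixed patterns are usually preferred.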

Example: Matplotlib

Matplotlib is one of the first python libraries a budding data scientist is likely to encounter. Matplotlib is a feature-rich plotting framework, capable of most plots you'll likely need. The interface to the matplotlib module mimics the plotting functionality in Matlab, another language and environment for scientific computing. If you're familiar with Matlab plots, matplotlib will seem very familiar. Even the plots look almost identical.

Here, we'll cover some basic functionality of matplotlib: line and bar plots and histograms. As with most content covered in this course, this is just scratching the surface. For more info, including many great examples, please consult the official matplotlib documentation. A typical pattern for me when plotting things in Python is to find an example that closely mirrors what I'm trying to do, copy this, and tweak until I get things right.

Note: to get plots to appear inline in ipython notebooks, you must invoke the "magic function" %matplotlib inline. To have a stand-alone python app plot in a new window, use plt.show().

In most cases, the input to matplotlib plotting functions is arrays of numerical types, floats or integers.


In [5]:
# used to embed plots inside an ipython notebook
%matplotlib inline 
import matplotlib.pyplot as plt

# really simple example:
y = [1,2,3,4,5,4,3,2,1]
x = [1,2,3,4,5,6,7,8,9]
plt.plot(x, y)


Out[5]:
[<matplotlib.lines.Line2D at 0x10726d990>]

In [6]:
import numpy as np

X = np.linspace(0, 10, 10000)
Y = []

for x in X:
    y = math.sin(x)
    Y.append(y)
    
plt.plot(X, Y, 'r-.')
plt.title('The Sine Wave')
plt.xlabel('X')
plt.ylabel('sin(X)')


Out[6]:
<matplotlib.text.Text at 0x107270450>

Notice that most of the functionality in matplotlib that we're using is in the sub-module matplotlib.pyplot.

The third argument in the plot function is a formatting specifier. This defines some properties for a line to be displayed. Some details:

Color characters:

  • b: blue

  • k: black

  • r: red

  • c: cyan

  • m: magenta

  • y: yellow

  • g: green

  • w: white

Some line/marker formatting specifiers:

  • -: solid line style

  • --: dashed line style

  • -.: dash-dot line style

  • :: dotted line style

  • .: point marker

  • ,: pixel marker

  • o: circle marker

  • +: plus marker

  • x: x marker

There are many other options for plots that can be specified. See documentation for more info.

It is possible to plot multiple lines on the same axes. In order to do this, the Y data passed into the plot function must be a list with one entry per X value, where each entry is itself a list holding one value for each line to be drawn:


In [7]:
Y = []
for x in X:
    y = [math.sin(x), math.cos(x)]
    Y.append(y)

plt.plot(X, Y)
plt.legend(['sin(x)', 'cos(x)'])


Out[7]:
<matplotlib.legend.Legend at 0x107617dd0>

It is also possible to just plot Y data without corresponding X values. In this case, the index in the array is assumed to be X.


In [8]:
plt.plot(Y)
plt.xlabel('index')
plt.ylabel('f(x)')
plt.legend(['sin(x)', 'cos(x)'])


Out[8]:
<matplotlib.legend.Legend at 0x107795610>

Alternatively, multiple calls to plot can be made with differing data. Doing so overlays the subsequent plots, creating the same effect.


In [9]:
Y = []
Z = []
for x in X:
    Y.append(math.sin(x))
    Z.append(math.cos(x))
    
plt.plot(X, Y, 'b-.')
plt.plot(X, Z, 'r--')
plt.legend(['sin(x)', 'cos(x)'])


Out[9]:
<matplotlib.legend.Legend at 0x1077a8910>

Bar plots are often a good way to compare data in categories. This is an easy matter with matplotlib; the interface is almost identical to that used when making line plots.


In [72]:
vals = [7, 6.2, 3, 5, 9]
xval = [1, 2, 3, 4, 5]
plt.bar(xval, vals)


Out[72]:
<Container object of 5 artists>

Histograms are extremely useful for analyzing data. Histograms partition numerical data into a discrete number of buckets (called bins), and return the number of values within each bucket. Typically this is displayed as a bar plot.


In [75]:
Y = []
for x in range(0,100000):
    Y.append(np.random.randn())
    
plt.hist(Y, 50)


Out[75]:
(array([  3.00000000e+00,   7.00000000e+00,   1.20000000e+01,
         1.50000000e+01,   3.40000000e+01,   4.30000000e+01,
         5.80000000e+01,   1.10000000e+02,   1.99000000e+02,
         2.50000000e+02,   3.97000000e+02,   6.12000000e+02,
         8.20000000e+02,   1.18300000e+03,   1.57200000e+03,
         2.02100000e+03,   2.49500000e+03,   3.03000000e+03,
         3.77600000e+03,   4.27300000e+03,   4.93900000e+03,
         5.36000000e+03,   5.95400000e+03,   6.34400000e+03,
         6.47000000e+03,   6.34400000e+03,   6.19000000e+03,
         5.94100000e+03,   5.50000000e+03,   4.96300000e+03,
         4.40600000e+03,   3.75200000e+03,   3.17700000e+03,
         2.56400000e+03,   1.97100000e+03,   1.51800000e+03,
         1.15200000e+03,   8.04000000e+02,   5.77000000e+02,
         4.23000000e+02,   2.93000000e+02,   1.69000000e+02,
         1.24000000e+02,   6.20000000e+01,   4.00000000e+01,
         2.50000000e+01,   1.50000000e+01,   9.00000000e+00,
         2.00000000e+00,   2.00000000e+00]),
 array([ -4.05314674e+00,  -3.89099218e+00,  -3.72883762e+00,
        -3.56668305e+00,  -3.40452849e+00,  -3.24237393e+00,
        -3.08021937e+00,  -2.91806481e+00,  -2.75591025e+00,
        -2.59375568e+00,  -2.43160112e+00,  -2.26944656e+00,
        -2.10729200e+00,  -1.94513744e+00,  -1.78298288e+00,
        -1.62082831e+00,  -1.45867375e+00,  -1.29651919e+00,
        -1.13436463e+00,  -9.72210068e-01,  -8.10055506e-01,
        -6.47900944e-01,  -4.85746383e-01,  -3.23591821e-01,
        -1.61437259e-01,   7.17302497e-04,   1.62871864e-01,
         3.25026426e-01,   4.87180987e-01,   6.49335549e-01,
         8.11490111e-01,   9.73644672e-01,   1.13579923e+00,
         1.29795380e+00,   1.46010836e+00,   1.62226292e+00,
         1.78441748e+00,   1.94657204e+00,   2.10872660e+00,
         2.27088117e+00,   2.43303573e+00,   2.59519029e+00,
         2.75734485e+00,   2.91949941e+00,   3.08165397e+00,
         3.24380854e+00,   3.40596310e+00,   3.56811766e+00,
         3.73027222e+00,   3.89242678e+00,   4.05458134e+00]),
 <a list of 50 Patch objects>)
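To make the idea of binning concrete, here is a pure-Python sketch of the counting step behind a histogram. It is not how matplotlib implements hist; count_bins is a hypothetical helper written only for illustration, using equal-width bins between the smallest and largest value. print is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
def count_bins(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    low = min(values)
    high = max(values)
    width = (high - low) / float(num_bins)
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            position = 0   # all values identical: everything goes in bin 0
        else:
            position = int((v - low) / width)
        # the maximum value would land one past the end; keep it in the last bin
        position = min(position, num_bins - 1)
        counts[position] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(count_bins(data, 4))   # [1, 2, 3, 3]
```

The first array in the output above plays the same role as the list returned here: one count per bin.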